    An Unsupervised Algorithm for Segmenting Categorical Timeseries into Episodes

    This paper describes an unsupervised algorithm for segmenting categorical time series into episodes. The Voting-Experts algorithm first collects statistics about the frequency and boundary entropy of n-grams, then passes a window over the series and has two “expert methods” decide where in the window boundaries should be drawn. The algorithm successfully segments text into words in four languages. It also segments time series of robot sensor data into subsequences that represent episodes in the life of the robot. We claim that Voting-Experts finds meaningful episodes in categorical time series because it exploits two statistical characteristics of meaningful episodes.
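    The two-expert voting scheme described above can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: the window length, the per-length z-score standardization, the exact split rule each expert applies, and the vote threshold are assumptions chosen for brevity, and all function names are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def ngram_stats(seq, max_n):
    """Count every n-gram with n <= max_n, plus the symbols that follow each one."""
    freq = defaultdict(int)
    followers = defaultdict(lambda: defaultdict(int))
    for n in range(1, max_n + 1):
        for i in range(len(seq) - n + 1):
            gram = tuple(seq[i:i + n])
            freq[gram] += 1
            if i + n < len(seq):
                followers[gram][seq[i + n]] += 1
    return freq, followers


def boundary_entropy(gram, followers):
    """Shannon entropy (bits) of the symbol that follows `gram`."""
    counts = followers.get(gram)
    if not counts:
        return 0.0
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def standardize(values):
    """Z-score each n-gram's value against all n-grams of the same length (assumed)."""
    by_len = defaultdict(list)
    for gram, v in values.items():
        by_len[len(gram)].append(v)
    stats = {n: (statistics.mean(vs), statistics.pstdev(vs) or 1.0)
             for n, vs in by_len.items()}
    return {g: (v - stats[len(g)][0]) / stats[len(g)][1] for g, v in values.items()}


def voting_experts(seq, window=4, threshold=2):
    """Hypothetical simplified Voting-Experts: returns the list of segments."""
    freq, followers = ngram_stats(seq, window)
    z_freq = standardize(freq)
    z_ent = standardize({g: boundary_entropy(g, followers) for g in freq})
    votes = [0] * (len(seq) + 1)
    for start in range(len(seq) - window + 1):
        w = tuple(seq[start:start + window])
        # Frequency expert: split so that both pieces are frequent n-grams.
        best_f = max(range(1, window), key=lambda k: z_freq[w[:k]] + z_freq[w[k:]])
        # Boundary-entropy expert: split after the prefix whose successor is least predictable.
        best_e = max(range(1, window), key=lambda k: z_ent[w[:k]])
        votes[start + best_f] += 1
        votes[start + best_e] += 1
    # Cut where the vote count reaches the threshold and is a local maximum.
    cuts = [i for i in range(1, len(seq))
            if votes[i] >= threshold and votes[i] >= votes[i - 1] and votes[i] >= votes[i + 1]]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]
```

    For example, `voting_experts("thecatsatonthemat" * 3)` returns string pieces that concatenate back to the input; where the cuts fall depends on the assumed window and threshold.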

    Finding the Intrinsic Patterns in a Collection of Time Series

    Compactly representing parallel program executions

    G-SteX: Greedy Stem Extension for Free-Length Constrained Motif Discovery

    Most available motif discovery algorithms for real-valued time series find approximately recurring patterns of a known length without any prior information about their locations or shapes. In this paper, we propose a new motif discovery algorithm that has the advantage of requiring no upper limit on the motif length. The proposed algorithm can discover multiple motifs of multiple lengths at once, and can achieve a better accuracy-speed balance than a recently proposed motif discovery algorithm. We then briefly report two successful applications of the proposed algorithm: gesture discovery and robot motion pattern discovery.
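    For contrast, the known-length setting that most existing algorithms assume can be illustrated with a brute-force search: given a fixed length m, report the closest pair of non-overlapping subsequences. This is a hypothetical baseline sketch, not G-SteX; z-normalization and Euclidean distance are assumed choices that are common in motif discovery, and the function names are illustrative.

```python
import math


def znorm(x):
    """Z-normalize a subsequence so motifs are compared by shape, not offset or scale."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    if sd == 0:
        return [0.0] * len(x)
    return [(v - mu) / sd for v in x]


def fixed_length_motif(series, m):
    """Brute-force top-1 motif of known length m: (distance, start_i, start_j)."""
    subs = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    best = (math.inf, None, None)
    for i in range(len(subs)):
        # j >= i + m skips trivial matches that overlap subsequence i.
        for j in range(i + m, len(subs)):
            d = math.dist(subs[i], subs[j])
            if d < best[0]:
                best = (d, i, j)
    return best
```

    The quadratic scan and the need to fix m in advance are exactly the limitations the abstract points to; the returned indices mark where the two occurrences start.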

    Compressing Microcontroller Execution Traces to Assist System Analysis

    Recent technological advances have made it possible to retrieve execution traces from microcontrollers. However, the huge amount of data in a collected trace makes trace analysis extremely difficult and time-consuming. In this paper, by leveraging both the cycles and the repetitions present in an execution trace, we present an approach that offers compact and accurate trace compression. The compressed trace can be used during analysis without decompression, notably for identifying repeated cycles or comparing different cycles. The evaluation demonstrates that our approach reaches high compression ratios on microcontroller execution traces.
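    To illustrate the idea of exploiting repetitions, and of querying the compressed form directly, here is a hypothetical greedy encoder that collapses back-to-back copies of a short block (for example, loop iterations) into a (block, count) token. It is not the paper's algorithm; the `max_period` bound, the greedy preference for the run covering the most events, and all function names are assumptions.

```python
def compress_repeats(trace, max_period=8):
    """Greedy tandem-repeat compression: k consecutive copies of a block become (block, k)."""
    out, i, n = [], 0, len(trace)
    while i < n:
        best_p, best_k = 1, 1
        for p in range(1, max_period + 1):
            block = trace[i:i + p]
            if len(block) < p:
                break
            k = 1
            while trace[i + k * p:i + (k + 1) * p] == block:
                k += 1
            # Keep the run that covers the most events; ties go to the shorter block.
            if k > 1 and p * k > best_p * best_k:
                best_p, best_k = p, k
        out.append((tuple(trace[i:i + best_p]), best_k))
        i += best_p * best_k
    return out


def decompress(tokens):
    """Expand (block, count) tokens back into the original event list."""
    return [e for block, k in tokens for _ in range(k) for e in block]


def event_count(tokens, event):
    """Count occurrences of an event without decompressing the trace."""
    return sum(block.count(event) * k for block, k in tokens)
```

    For instance, the trace `["init", "a", "b", "a", "b", "a", "b", "end"]` compresses to `[(("init",), 1), (("a", "b"), 3), (("end",), 1)]`, and `event_count` can answer "how many `a` events?" from those three tokens alone.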

    A two-level structure for compressing aligned bitexts

    A bitext, or bilingual parallel corpus, consists of two texts in different languages that are mutual translations. Bitexts are very useful in linguistic engineering because they serve as a source of knowledge for many purposes. In this paper we propose a strategy to compress and use bitexts efficiently, saving not only space but also processing time when exploiting them. Our strategy is based on a two-level structure for the vocabularies and on the use of biwords (pairs of associated words, one from each language) as the basic symbols encoded with an End-Tagged Dense Code (ETDC) compressor. The resulting compressed bitext needs around 20% of the original space and allows more efficient implementations of the searches and operations that linguistic engineering applications perform on bitexts. We discuss and provide results for compression, decompression, several types of searches, and bilingual snippet extraction.
    Acknowledgements: Spanish projects TIN2006-15071-C03-01, TIN2006-15071-C03-02 and TIN2006-15071-C03-03; Regional Government of Castilla y León and the European Social Fund.
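    A minimal sketch of the biword idea with an ETDC encoder follows. It assumes the bitext is already aligned into a list of (source word, target word) pairs and uses a single flat biword vocabulary ranked by frequency, so it omits the paper's two-level vocabulary structure; the function names are hypothetical. In ETDC each rank is written as base-128 digits, and only the last byte of a codeword has its high bit set, which marks where the codeword ends.

```python
from collections import Counter


def etdc_encode(rank):
    """End-Tagged Dense Code: base-128 digits, high bit set only on the final byte."""
    out = [rank % 128 + 128]
    x = rank // 128
    while x > 0:
        x -= 1
        out.insert(0, x % 128)
        x //= 128
    return bytes(out)


def etdc_decode_one(data, pos):
    """Decode one codeword starting at data[pos]; return (rank, next_pos)."""
    rank = 0
    while data[pos] < 128:  # continuation bytes have the high bit clear
        rank = rank * 128 + data[pos] + 1
        pos += 1
    return rank * 128 + (data[pos] - 128), pos + 1


def compress_bitext(aligned_pairs):
    """Encode an aligned bitext as one ETDC codeword per (source, target) biword."""
    freq = Counter(aligned_pairs)
    # Most frequent biwords get the lowest ranks, hence the shortest codewords.
    vocab = [bw for bw, _ in freq.most_common()]
    rank_of = {bw: r for r, bw in enumerate(vocab)}
    data = b"".join(etdc_encode(rank_of[bw]) for bw in aligned_pairs)
    return vocab, data


def decompress_bitext(vocab, data):
    pairs, pos = [], 0
    while pos < len(data):
        rank, pos = etdc_decode_one(data, pos)
        pairs.append(vocab[rank])
    return pairs
```

    Ranks 0 to 127 take one byte and the next 16,384 ranks take two, so a frequency-ranked biword vocabulary keeps the most common translation pairs at one byte each.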

    Epidemiological survey of workers exposed to cadmium: effect on lung, kidney, and several biological indices

    Given a text, grammar-based compression constructs a grammar that generates the text. There are many text compression techniques of this type. Each compression scheme is categorized as either off-line or on-line, according to how the text is processed. One representative tactic for off-line..
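    As an example of an off-line scheme, the sketch below follows the Re-Pair strategy: scan the whole text, replace the most frequent pair of adjacent symbols with a new nonterminal, and repeat until no pair occurs twice. It is offered only as an illustration of the off-line category, not necessarily the representative tactic the truncated sentence refers to, and it simplifies Re-Pair by counting overlapping pairs and recounting from scratch each round. The nonterminal labels and function names are hypothetical.

```python
from collections import Counter


def repair(text):
    """Re-Pair-style off-line grammar compression: returns (start_sequence, rules)."""
    seq = list(text)
    rules = {}
    next_id = 0
    while True:
        # Simplification: overlapping occurrences (e.g. in "aaa") are counted too.
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break
        nt = ("R", next_id)  # hypothetical nonterminal label; cannot clash with characters
        next_id += 1
        rules[nt] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nt)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Recursively expand a symbol into the terminal string it derives."""
    if symbol in rules:
        left, right = rules[symbol]
        return expand(left, rules) + expand(right, rules)
    return symbol


def decode_grammar(seq, rules):
    return "".join(expand(s, rules) for s in seq)
```

    For instance, `repair("abababab")` produces the rules R0 → ab and R1 → R0 R0 with start sequence R1 R1, which `decode_grammar` expands back to the original text.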